Encoding Distributional Semantics into Triple-Based Knowledge Ranking for Document Enrichment
Abstract
Document enrichment focuses on retrieving relevant knowledge from external resources, which is essential because text is generally replete with gaps. Since conventional work primarily relies on special resources, we instead use triples of Subject, Predicate, Object as knowledge and incorporate distributional semantics to rank them. Our model first extracts these triples automatically from raw text and converts them into real-valued vectors based on the word semantics captured by Latent Dirichlet Allocation. We then represent these triples, together with the source document that is to be enriched, as a graph of triples, and adopt a global iterative algorithm to propagate relevance weight from the source document to these triples so as to select the most relevant ones. Evaluated as a ranking problem, our model significantly outperforms multiple strong baselines. Moreover, we conduct a task-based evaluation by incorporating these triples as additional features into document classification, which improves performance by 3.02%.
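The propagation step described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes that the document and each triple already have topic vectors (for example, from LDA), that edges are weighted by cosine similarity, and that propagation is a random walk with restart at the source document. The names `cosine` and `rank_triples`, the damping value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_triples(doc_vec, triple_vecs, damping=0.85, iters=50):
    """Propagate relevance from the source document to candidate triples.

    Assumption: random walk with restart on a fully connected graph whose
    edge weights are cosine similarities between topic vectors.
    """
    nodes = [doc_vec] + list(triple_vecs)  # node 0 is the source document
    n = len(nodes)
    # Similarity-weighted adjacency matrix without self-loops.
    weights = [
        [cosine(nodes[i], nodes[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Row-normalise so each node distributes its weight among its neighbours.
    for row in weights:
        total = sum(row)
        if total:
            row[:] = [x / total for x in row]
    restart = [1.0] + [0.0] * (n - 1)  # all restart mass sits on the document
    score = restart[:]
    for _ in range(iters):
        score = [
            (1 - damping) * restart[j]
            + damping * sum(score[i] * weights[i][j] for i in range(n))
            for j in range(n)
        ]
    return score[1:]  # relevance of each triple, in input order


# Toy topic vectors (made up for illustration only).
doc = [0.8, 0.1, 0.1]
triples = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.6, 0.2]]
scores = rank_triples(doc, triples)
ranking = sorted(range(len(triples)), key=lambda i: scores[i], reverse=True)
```

Here `ranking` lists triple indices from most to least relevant. Because the restart vector concentrates all mass on the document node, triples that are close to the document, or close to other triples that are, accumulate more weight than isolated ones.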
Similar resources
Triple based Background Knowledge Ranking for Document Enrichment
Document enrichment is the task of retrieving additional knowledge from external resources beyond what is available in the source document. This task is essential because text is generally replete with gaps and ellipses, since authors assume a certain amount of background knowledge. Recovering these gaps is intuitively useful for a better understanding of the document. Conven...
Toward a Deep Neural Approach for Knowledge-Based IR
This paper tackles the problem of the semantic gap between a document and a query within an ad-hoc information retrieval task. In this context, knowledge bases (KBs) have already been acknowledged as valuable means since they allow the representation of explicit relations between entities. However, they do not necessarily represent implicit relations that could be hidden in a corpus. This latt...
Distributional and Neural Models for Extracting Manipulation-Relevant Relations from Text Corpora
In this paper we present a novel approach based on neural network techniques to extract common sense knowledge from text corpora. We apply this approach to extract common sense knowledge about everyday objects that can be used by intelligent machines, e.g. robots, to support planning of tasks that involve object manipulation. The knowledge we extract is constituted by relations that relate s...
Text-Based Ontology Enrichment Using Hierarchical Self-organizing Maps
The success of the Semantic Web research is dependent upon the construction of complete and reliable domain ontologies. In this paper we describe an unsupervised framework for domain ontology enrichment based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an...
Novel Ranking-Based Lexical Similarity Measure for Word Embedding
Distributional semantics models derive word space from linguistic items in context. Meaning is obtained by defining a distance measure between vectors corresponding to lexical entities. Such vectors present several problems. In this paper we provide a guideline for post-processing improvements to the baseline vectors. We focus on refining the similarity aspect, address imperfections of the model b...